Author Response for "Generating Diverse High-Fidelity Images with VQ-VAE-2"

Neural Information Processing Systems

We thank the reviewers for the detailed and constructive feedback. R2 had positive remarks about the significance of our method and the thoroughness of our evaluation. We believe these clarifications will resolve all reviewers' concerns. R1 - Interpolations: there is indeed no simple way to do interpolations, which we will clarify in the final version. We also implemented incremental sampling (as in Paine et al., arxiv.org/abs/1611.09482) and the exponential moving average version, which does not use stop-gradients.



Reviews: Generating Diverse High-Fidelity Images with VQ-VAE-2

Neural Information Processing Systems

This paper presents impressive visual samples and strong quantitative scores for an autoencoder-based generative model. All reviewers agree on this point, and it is primarily why acceptance is warranted; an AE pipeline with this capability is a worthwhile contribution to the community. However, the proposed method consists mostly of engineered enhancements to the basic VQ-VAE model, which has already been published. Moreover, full architectural details and hyperparameter settings were not provided in the original submission but were promised for the final version.


Reviews: Generating Diverse High-Fidelity Images with VQ-VAE-2

Neural Information Processing Systems

In addition, the model inherits the nice property of AE-based models that it does not suffer from mode collapse. However, it seems to me that the only difference between this paper and the VQ-VAE paper is that this work introduces a hierarchical structure to learn different levels of latent representations and priors, so the novelty looks a bit low. Furthermore, the paper does not explain why such a design improves generative performance. Because of the stop-gradient operator, the loss function (2) is not a reasonable objective to optimize: during optimization, taking a step along the gradient directions may actually increase the loss.
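The stop-gradient objection above concerns the standard VQ-VAE training objective, in which the two latent terms each stop gradients on one side of the quantization. A minimal NumPy sketch of the nearest-neighbour quantization step and the three-term loss (function names, array shapes, and the default beta are our illustrative choices, not taken from the paper; NumPy has no autodiff, so the stop-gradient placement is indicated in comments only):

```python
import numpy as np

def vector_quantize(z_e, codebook):
    """Map each encoder output vector to its nearest codebook entry."""
    # Pairwise squared distances, shape (num_vectors, num_codes).
    d = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)
    return codebook[idx], idx

def vq_vae_loss(x, x_rec, z_e, z_q, beta=0.25):
    """Three-term VQ-VAE objective: reconstruction + codebook + commitment.

    In an autodiff framework the two latent terms differ only in where the
    stop-gradient sg[.] sits: sg[z_e] in the codebook term (gradient flows
    to the codes) and sg[z_q] in the commitment term (gradient flows to
    the encoder).
    """
    recon = ((x - x_rec) ** 2).mean()
    codebook_term = ((z_e - z_q) ** 2).mean()  # sg[z_e]: updates codebook
    commit_term = ((z_e - z_q) ** 2).mean()    # sg[z_q]: updates encoder
    return recon + codebook_term + beta * commit_term
```

Because each term sees a stop-gradient on the other term's variables, the combined update is not the gradient of any single scalar objective, which is the source of the reviewer's concern that a step can increase the loss.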


Generating Diverse High-Fidelity Images with VQ-VAE-2

Neural Information Processing Systems

We explore the use of Vector Quantized Variational AutoEncoder (VQ-VAE) models for large scale image generation. To this end, we scale and enhance the autoregressive priors used in VQ-VAE to generate synthetic samples of much higher coherence and fidelity than possible before. We use simple feed-forward encoder and decoder networks, making our model an attractive candidate for applications where the encoding and/or decoding speed is critical. Additionally, VQ-VAE requires sampling an autoregressive model only in the compressed latent space, which is an order of magnitude faster than sampling in the pixel space, especially for large images. We demonstrate that a multi-scale hierarchical organization of VQ-VAE, augmented with powerful priors over the latent codes, is able to generate samples with quality that rivals that of state-of-the-art Generative Adversarial Networks on multifaceted datasets such as ImageNet, while not suffering from GANs' known shortcomings such as mode collapse and lack of diversity.
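A back-of-the-envelope calculation illustrates the sampling-speed claim. Assuming a 256×256 RGB image and a two-level latent hierarchy with a 32×32 top grid and a 64×64 bottom grid (our reading of the paper's ImageNet setup), the number of sequential autoregressive steps compares as:

```python
# Sequential steps for autoregressive sampling, counted one per generated symbol.
pixel_steps = 256 * 256 * 3        # pixel-space model: one step per sub-pixel
latent_steps = 32 * 32 + 64 * 64   # hierarchical VQ-VAE: one step per latent code
speedup = pixel_steps / latent_steps
print(pixel_steps, latent_steps, round(speedup, 1))  # 196608 5120 38.4
```

Under these assumed grid sizes the latent-space model takes well over an order of magnitude fewer sequential steps, consistent with the abstract's claim.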


Generating Diverse High-Fidelity Images with VQ-VAE-2

Razavi, Ali, Oord, Aaron van den, Vinyals, Oriol

Neural Information Processing Systems



Generating Diverse High-Fidelity Images with VQ-VAE-2

Razavi, Ali, Oord, Aaron van den, Vinyals, Oriol

arXiv.org Machine Learning
